5 research outputs found

    Learning to Reason with a Scalable Probabilistic Logic

    Learning to reason and understand the world's knowledge is a fundamental problem in Artificial Intelligence (AI). Traditional symbolic AI methods were popular in the 1980s, when first-order logic rules were mostly handwritten and reasoning algorithms were built on top of them. In the 1990s, researchers increasingly turned to statistical methods that handle the uncertainty of data using probabilistic models. While it is widely hypothesized that both the symbolic and statistical approaches are necessary for building intelligent systems, in practice, bridging the two in a combined framework often brings intractability: most probabilistic first-order logics are simply not efficient enough for real-world-sized tasks. For example, Markov Logic [83] integrates first-order logic with Markov random field theory, but when the entities in a knowledge base (KB) are mapped to the propositional theory (i.e., grounding), the size of the network depends on the number of facts in the KB, i.e., O(n^k), where k is the arity of the predicate and n is the number of KB constants.

    In this thesis, we design a new probabilistic logic programming paradigm to address various scalability issues in probabilistic logics. We propose a group of scalable methods for inference, learning, and inducing the structure of probabilistic logics. More specifically, we propose a scalable probabilistic logic called ProPPR [105] to combine the best of the symbolic and statistical worlds. ProPPR can be viewed as a probabilistic version of Prolog; we associate a feature vector with each clause and learn weights from data, and the learned weights are used to control search during inference. ProPPR's inference scheme is distinctive: instead of performing potentially intractable global inference, ProPPR uses a provably-correct approximate personalized PageRank to conduct local grounding, whose inference time is independent of the size of the KB. To test ProPPR on large, real-world relational learning problems, we show that it can be used as a recursive relational learning engine [108] for large-scale KB inference tasks.

    Another challenging problem in statistical relational learning is structure learning: learning logic programs from a KB. Prior approaches can take up to a month to learn the theory for a small KB [50]. This thesis provides a scalable solution to finding inference rules: we create a higher level of abstraction and reduce the unknown structure learning problem to parameter learning in ProPPR. To refine the learning process, we also introduce an iterated structural gradient algorithm for pruning the search space. This thesis also connects structured sparsity in machine learning with predicate invention in inductive logic programming [107]. To improve structure learning and incorporate latent variable modeling, we have also designed a rather radical approach: learning low-dimensional continuous latent embeddings of logical formulas [103].
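    The scalability claim above rests on local grounding via approximate personalized PageRank. The sketch below illustrates a standard "push"-style approximation on a toy proof graph; the graph, parameter values, and function names are illustrative assumptions rather than ProPPR's actual implementation, but they show why the work done depends on the approximation tolerance rather than on the size of the KB.

```python
# Minimal sketch of approximate personalized PageRank via the "push" method,
# the kind of local grounding ProPPR-style inference relies on.
# Graph, node names, alpha, and eps are illustrative assumptions.
from collections import defaultdict

def approx_personalized_pagerank(neighbors, seed, alpha=0.15, eps=1e-4):
    """Approximate PPR scores in the neighborhood of `seed`.

    neighbors: dict mapping node -> list of out-neighbors (a toy proof graph).
    alpha: restart (teleport) probability back to the seed.
    eps: per-node residual tolerance; smaller eps explores a larger subgraph.
    The number of pushes is bounded in terms of 1/(alpha * eps), independent
    of the total graph size, which is the key to scalable local grounding.
    """
    p = defaultdict(float)   # accumulated PageRank mass
    r = defaultdict(float)   # residual (not-yet-pushed) mass
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        out = neighbors.get(u, [])
        # Sink nodes keep their residual; other nodes are pushed only while
        # their residual per out-edge exceeds the tolerance.
        if not out or r[u] <= eps * len(out):
            continue
        mass = r[u]
        r[u] = 0.0
        p[u] += alpha * mass
        share = (1.0 - alpha) * mass / len(out)
        for v in out:
            r[v] += share
            if neighbors.get(v) and r[v] > eps * len(neighbors[v]):
                queue.append(v)
    return dict(p)

# Toy proof graph: the query expands to clauses, which expand to facts.
graph = {"query": ["clause1", "clause2"], "clause1": ["fact_a"], "clause2": ["fact_b"]}
print(approx_personalized_pagerank(graph, "query"))
```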

    An empirical investigation of sparse log-linear models for improved dialogue act classification

    Previous work on dialogue act classification has primarily focused on dense generative and discriminative models. However, since automatic speech recognition (ASR) outputs are often noisy, dense models can produce biased estimates and overfit to the training data. In this paper, we study sparse modeling approaches to improve dialogue act classification, since sparse models maintain a compact feature space that is robust to noise. To test this, we investigate element-wise frequentist shrinkage models such as lasso, ridge, and elastic net, as well as structured sparsity models and a hierarchical sparsity model that embed the dependency structure and interactions among local features. In our experiments on a real-world dataset, when augmenting N-best word- and phone-level ASR hypotheses with confusion network features, our best sparse log-linear model obtains a relative improvement of 19.7% over a rule-based baseline, a significant 3.7% improvement over a traditional non-sparse log-linear model, and outperforms a state-of-the-art SVM model by 2.2%.
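    As a rough illustration of the element-wise shrinkage models compared above (lasso, ridge, elastic net), the following sketch fits penalized log-linear (logistic regression) classifiers on toy utterances; the features, data, and hyperparameters are illustrative assumptions and not the paper's actual experimental setup.

```python
# Minimal sketch: compare L1, L2, and elastic-net penalties for a
# log-linear dialogue act classifier on toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy ASR-hypothesis utterances and dialogue act labels (illustrative only).
utterances = ["yes that is right", "what time is the meeting",
              "book the flight please", "no thanks"]
labels = ["affirm", "question", "request", "negate"]

penalties = {
    "lasso (L1)": dict(penalty="l1", solver="saga", C=1.0),
    "ridge (L2)": dict(penalty="l2", solver="saga", C=1.0),
    "elastic net": dict(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0),
}

for name, kwargs in penalties.items():
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=5000, **kwargs))
    clf.fit(utterances, labels)
    # Sparser penalties keep fewer non-zero weights, i.e. a more compact feature space.
    nonzero = (clf.named_steps["logisticregression"].coef_ != 0).sum()
    print(f"{name}: {nonzero} non-zero weights")
```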

    Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing

    Spoken dialogue systems typically use predefined semantic slots to parse users' natural language inputs into unified semantic representations. Defining the slots usually requires domain experts and professional annotators, which can be costly. In this paper, we ask the following question: given a collection of unlabeled raw audio recordings, can we use frame semantics theory to automatically induce and fill the semantic slots in an unsupervised fashion? To do this, we propose the use of a state-of-the-art frame-semantic parser and a spectral clustering based slot ranking model that adapts the generic output of the parser to the target semantic space. Empirical experiments on a real-world spoken dialogue dataset show that the automatically induced semantic slots are in line with the reference slots created by domain experts: we observe a mean average precision of 69.36% using ASR-transcribed data. Our slot filling evaluations also indicate the promise of the proposed approach.
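    A minimal sketch of the clustering step described above: candidate frames proposed by a frame-semantic parser are clustered spectrally over a similarity graph and ranked by a crude coherence score. The frame names, vectors, and scoring here are assumptions for illustration; the paper's actual slot ranking model, which adapts generic frames to the target domain, is only gestured at.

```python
# Minimal sketch of spectral clustering over candidate frames for slot induction.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

# Candidate frames evoked by the parser on transcribed utterances,
# each represented by a (toy) distributional context vector.
frames = ["food", "locale", "expensiveness", "being_born", "weather"]
vectors = np.random.RandomState(0).rand(len(frames), 16)

# Build a similarity graph over candidate frames and cluster it spectrally.
affinity = cosine_similarity(vectors)
clustering = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit(affinity)

# Rank clusters by average intra-cluster similarity as a crude coherence score;
# generic frames that do not cohere with the domain tend to rank lower.
for c in sorted(set(clustering.labels_)):
    idx = np.where(clustering.labels_ == c)[0]
    score = affinity[np.ix_(idx, idx)].mean()
    print(f"cluster {c}: {[frames[i] for i in idx]} coherence={score:.2f}")
```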

    Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach

    Dependency parsing is a core task in NLP, and it is widely used by many applications such as information extraction, question answering, and machine translation. In the era of social media, a major challenge is that parsers trained on traditional newswire corpora typically suffer from domain mismatch and thus perform poorly on social media data. We present a new GFL/FUDG-annotated Chinese treebank with more than 18K tokens from Sina Weibo (the Chinese equivalent of Twitter). We formulate the dependency parsing problem as many small and parallelizable arc prediction tasks: for each task, we use a programmable probabilistic first-order logic to infer the dependency arc of a token in the sentence. In experiments, we show that the proposed model outperforms an off-the-shelf Stanford Chinese parser, as well as a strong MaltParser baseline trained on the same in-domain data.
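    The decomposition into many small arc prediction tasks can be sketched as follows: each token independently selects its most likely head, so the per-token tasks are trivially parallelizable. The toy scorer below is an illustrative stand-in for the probabilistic first-order logic inference used in the paper; the sentence and scoring heuristic are assumptions.

```python
# Minimal sketch of the "many small arc prediction tasks" decomposition:
# each token's head is predicted independently, so tasks can run in parallel.
from concurrent.futures import ThreadPoolExecutor

sentence = ["ROOT", "我", "喜欢", "微博"]  # ROOT plus tokens of a toy Weibo sentence

def score_arc(head_idx, mod_idx):
    # Toy scorer: prefer nearby heads. A real system would score candidate
    # arcs with learned clause weights in a probabilistic logic.
    return -abs(head_idx - mod_idx)

def predict_head(mod_idx):
    # One small task: choose the best head for token `mod_idx` among all other positions.
    candidates = [h for h in range(len(sentence)) if h != mod_idx]
    return max(candidates, key=lambda h: score_arc(h, mod_idx))

# Run all per-token arc prediction tasks in parallel.
with ThreadPoolExecutor() as pool:
    heads = list(pool.map(predict_head, range(1, len(sentence))))

for mod, head in zip(range(1, len(sentence)), heads):
    print(f"{sentence[mod]} <- {sentence[head]}")
```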

    “Love ya, jerkface”: using Sparse Log-Linear Models to Build Positive (and Impolite) Relationships with Teens

    One challenge of implementing spoken dialogue systems for long-term interaction is how to adapt the dialogue as user and system become more familiar with each other. We believe this challenge includes evoking and signaling aspects of long-term relationships such as rapport. For tutoring systems, this may additionally require knowing how relationships are signaled among non-adult users. We therefore investigate conversational strategies used by teenagers in peer tutoring dialogues, and how these strategies function differently between friends and strangers. In particular, we use annotated and automatically extracted linguistic devices to predict impoliteness and positivity in the next turn. To account for the sparse nature of these features in real data, we use models including lasso, the ridge estimator, and elastic net. We evaluate the predictive power of our models under various settings and compare our sparse models with standard non-sparse solutions. Our experiments demonstrate that our models are quantitatively more accurate than non-sparse models, and that teens use unexpected kinds of language to do relationship work such as signaling rapport, but that friends and strangers, as well as tutors and tutees, carry out this work in quite different ways.
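    To make the sparse-versus-non-sparse comparison concrete, the sketch below cross-validates L2-, L1-, and elastic-net-penalized log-linear models on synthetic sparse feature data; the data, feature construction, and regularization strengths are illustrative assumptions, not the paper's corpus or settings.

```python
# Minimal sketch: cross-validated comparison of sparse and non-sparse models
# for predicting a binary next-turn label (e.g., impoliteness) from sparse
# linguistic-device features. All data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.binomial(1, 0.05, size=(200, 500)).astype(float)  # sparse device indicators
y = (X[:, :3].sum(axis=1) > 0).astype(int)                # label driven by a few features

models = {
    "non-sparse (L2)": LogisticRegression(penalty="l2", C=1.0, max_iter=5000),
    "lasso (L1)": LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000),
    "elastic net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=1.0, max_iter=5000),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy {acc:.3f}")
```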